Skip to content

Conversation

@ChrisPaulBennett
Copy link
Contributor

@ChrisPaulBennett ChrisPaulBennett commented Mar 12, 2025

This apart of 3 pull requests for adding CPU time and Max RSS analysis to the Cylc UI.

This adds the Max RSS and CPU time (as measured by cgroups) to the table view, box plot and time series views.

This adds a python profiler script. This profiler will will be ran by cylc in the same crgroup as the cylc task. It will periodically poll cgroups and save data to a file. Cylc will then store these values in the sql db file.

Linked to;
cylc/cylc-ui#2100
cylc/cylc-uiserver#675

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@ChrisPaulBennett ChrisPaulBennett marked this pull request as draft March 12, 2025 09:19
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Changed the name of the profiler module.
Linting

Profiler sends KB instead of bytes

Time Series now working

CPU/Memory Logging working
@ChrisPaulBennett ChrisPaulBennett force-pushed the cylc_profiler branch 2 times, most recently from 4a6dbe8 to 30a7bb0 Compare April 1, 2025 15:31
ChrisPaulBennett and others added 19 commits April 2, 2025 09:34
Initial profiler implementation (non working)

Changed the name of the profiler module.
Linting

Profiler sends KB instead of bytes

Time Series now working

CPU/Memory Logging working

Adding profiler unit tests

updating tests

Fail gracefully if cgroups cannot be found

Revert "Fail gracefully if cgroups cannot be found"

This reverts commit 92e1e11c9b392b4742501d399f191f590814e95e.

Linting

Modifying unit tests
Linting
Changed the name of the profiler module.

Profiler sends KB instead of bytes

Time Series now working
* The COPYING file appears to have moved from `dist-info/COPYING` into
  `dist-info/licenses/COPYING`.
* Attempt to fix flaky test.
* Cut out shell profile files to omit some spurious stderr.
* Attempt to fix flaky test.
* Cut out shell profile files to omit some spurious stderr.
Revert "Fail gracefully if cgroups cannot be found"

This reverts commit 92e1e11c9b392b4742501d399f191f590814e95e.

Adding profiler unit tests

updating tests
@oliver-sanders
Copy link
Member

oliver-sanders commented Oct 21, 2025

We seem to have agreement on the validity cgroups field(s) we're polling, we should be able to push forward with this quickly now.

Last couple of outstanding comments:

  • Remove global variables.
  • Backup the profiler output to the job.status file.
    • This PR is using the cylc.flow.send_messages interface to send the message back to the scheduler.
    • However, we should probably be using the record_messages interface (MB I think, sry).
    • This interface will additionally write the message to the job.status file.
    • This way, the message will not be lost if the scheduler is stopped at the time the message is sent, or if a network issue prevents the transmission of the message.
    • Note, this interface does not presently support a comms_timeout option. This isn't a biggie, we can drop this.

Then we can get Dave to ok the cgroup stuff and we're away...

@ChrisPaulBennett
Copy link
Contributor Author

ChrisPaulBennett commented Oct 21, 2025

* Pass the `Process` object into the `stop_profiler` method.

How do I do that? I couldn't see a way to do it. Since I'm not calling the function, its the registered function for sigkill
The use of globals is the only way I could see to get around. I'd love to get rid of them.

*EDIT. Sorry, I didn't see the pull request, I'll go through it now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants